ABSTRACT

Over the past two decades there has been remarkable progress in resolving angiosperm phylogeny and patterns of evolution.
Studies using primarily plastid molecular data sets have revealed new insights into numerous historically contentious
problems of deep-level angiosperm phylogeny, including relationships among “basal angiosperms” (not members of either the
eudicot or monocot clades), among clades of Mesangiospermae, and among major clades of eudicots. The same large data sets
also have provided evidence for numerous rapid radiations throughout the evolution of angiosperms. The five lineages of
Mesangiospermae, as well as most major core eudicot lineages, each likely arose within a narrow range of just a few million
years. The rapid radiations in rosids (Rosidae) gave rise to angiosperm-dominated forests, which are also associated with the
diversification of ants, beetles, hemipterans, amphibians, and most extant ferns. Ongoing phylogenetic analyses now routinely
construct phylogenetic hypotheses encompassing thousands of taxa. Such trees enable us to take a broad phylogenetic
perspective on character evolution, community assembly, and conservation. While the wealth of new sequence data continues
to transform the study of angiosperm evolution, it also presents major computational and informatic challenges associated with
the management and analysis of enormous data sets.

With perhaps 400,000 extant species, the angio- 
sperms represent one of the largest terrestrial 
radiations. During the past 20 years, contributions 
from paleobotany, phylogenetics, developmental 
biology, and developmental genetics have provided 
new perspectives on the diversification of the 
angiosperms. Advances in large-scale genome 
sequencing approaches, such as rapid whole-plastid 
genome sequencing via next-generation sequencing 
technology, have enabled particularly dramatic 
progress in resolving plant relationships at deep 
levels. Here, we first highlight improvements in our 
understanding of deep-level angiosperm phylogeny. 
We then review how this robust phylogenetic 
underpinning has made it possible to pinpoint and 
date rapid divergences within the angiosperms, as 
well as to improve estimates of the timing of the 
origin of the angiosperms and major subclades 
within the angiosperms. It is now clear, for example, 
that the angiosperms are characterized by numer- 
ous, distinct rapid radiations, many of which are 
associated with co-diversification events in diverse 
lineages. Finally, we focus on some of the future
prospects and opportunities for angiosperm phylogenetics.

RESOLVING ANGIOSPERM PHYLOGENY


Early studies using DNA sequences provided a 
solid phylogenetic framework of angiosperms and 
defined the major clades (e.g., Angiosperm Phylogeny 
Group, 1998; Angiosperm Phylogeny Group II, 2003; 
Angiosperm Phylogeny Group III, 2009; reviewed in 
Chase, 2004; Judd & Olmstead, 2004; Soltis & Soltis, 
2004; Leebens-Mack et al., 2005; Soltis et al., 2005, 
2009; Chase et al., 2006; Graham et al., 2006). A 
complete review of this dynamic period is beyond 
the scope of this paper, but a few landmark papers 
illustrate the rapid progress in angiosperm phyloge- 
netics. Ritland and Clegg (1987) first suggested that 
the plastid gene rbcL was useful for phylogeny 
reconstruction in angiosperms, and by 1990, before 
the routine use of polymerase chain reaction (PCR) in 
systematics and evolutionary biology, the first papers 
appeared using rbcL to infer angiosperm phylogeny 
(e.g., Doebley et al., 1990; Soltis et al., 1990). Shortly 
thereafter, advances in PCR technology enabled a 
collaborative group of scientists to generate a
500-sequence rbcL data set (Chase et al., 1993), which
provided the first broad framework of angiosperm 
phylogeny. By the late 1990s, other large, collaborative
ventures ultimately produced 3-gene analyses of
560 flowering plant species (Soltis et al., 1999, 2000).
These examples illustrate well the central role of
large-scale collaboration in realizing rapid progress in
angiosperm phylogenetics and provide a useful
template for addressing large-scale questions in the
future.

eans, New Orleans, Louisiana 70148, U.S.A.
5 Gainesville, Florida 32611, U.S.A.

Ann. Missouri Bot. Gard. 97: 514-526. Published on 27 December 2010.

Volume 97, Number 4
2010

Despite the rapid progress of these early studies in 
defining the major subclades and revealing the basal 
splits in angiosperm phylogeny, relationships among 
major subclades remained unresolved. In the past few 
years, technological improvements (e.g., next-genera- 
tion sequencing) have dramatically accelerated the 
pace of DNA sequencing, permitting the construction 
of massive data sets involving thousands of nucleo- 
tides. This rapid and relatively inexpensive sequenc- 
ing has helped resolve most of the remaining 
problematic deep-level questions of relationships in 
flowering plants. Furthermore, these and other 
technological advances set the stage for building ever 
larger trees comprising thousands of terminals. 


PHYLOGENOMICS 


In less than four years, next-generation sequencing 
technologies (e.g., Roche 454 [Roche, Branford,
Connecticut, U.S.A.]; Illumina Solexa [Illumina, Inc.,
San Diego, California, U.S.A.]; and ABI SOLiD
[Applied Biosystems, Foster City, California, U.S.A.])
have introduced a genomic perspective to phyloge- 
netics. For example, some investigators are using these 
technologies for deep transcriptome sequencing, and 
the resulting expressed sequence tags (ESTs) have 
been used to leverage the vast and underutilized 
nuclear genome for deep-level phylogenetic inference 
(e.g., de la Torre et al., 2006; Sanderson & McMahon, 
2007; de la Torre-Bárcena et al., 2009). This approach 
is being employed by the 1KP project (an international 
consortium that will generate transcript sequences for 
1000 plant species over the next two years; G. K. Wong, 
Principal Investigator [PI], University of Alberta) as 
well as the monocot Assembling the Tree of Life project 
(AToL; T. Givnish, PI, University of Wisconsin) to 
resolve relationships across green plants and monocots, 
respectively. 

Although EST sequences generated via next- 
generation transcriptome sequencing represent a 
wealth of potential phylogenetic data, this approach 
also presents challenges for phylogenetic inference. 
For example, sampling of orthologous gene copies 
among taxa is not guaranteed; thus, both contig 
assembly and phylogenetic inference may be ham- 
pered by comparison of paralogs. Furthermore, the 
alignment of short EST fragments typically results in 
large amounts of missing data, which can complicate
phylogenetic analyses (Hartmann & Vision, 2007;
Lemmon et al., 2009). Also, high rates of nuclear gene 
duplication (including whole-genome duplications) 
and loss, as well as incomplete lineage sorting, can 
confound phylogenetic inference, creating incongru- 
ence between gene tree and species tree topologies 
(e.g., Maddison, 1997). Despite these potential
pitfalls, recent studies suggest that phylogenomic 
analyses employing EST data can be highly informa- 
tive in plants (de la Torre et al., 2006; Sanderson & 
McMahon, 2007; de la Torre-Bárcena et al., 2009; 
Burleigh et al., in press). 

Perhaps the biggest impact of next-generation 
sequencing in angiosperm phylogenetics has been 
rapid sequencing of complete plastid genomes. The 
plastid genome has long been the workhorse of plant 
systematics because of its ease of amplification, its 
relative lack of gene duplication and recombination, 
and its wealth of characters (ca. 150,000 bp) that are 
phylogenetically informative across many taxonomic 
levels. Importantly, next-generation sequencing has 
now made complete plastid genome sequencing 
routine and relatively inexpensive. The plastid 
genome is ideally suited for next-generation sequenc- 
ing because of its structural simplicity, highly 
conserved gene content and arrangement, rarity of 
repeats, and small genomic size (Raubeson & Jansen, 
2005; Jansen et al., 2005; Moore et al., 2006). Both 
the Roche 454 and Illumina Solexa sequencers have 
been successfully used to sequence plastid genomes 
(e.g., Moore et al., 2006, 2007; Cronn et al., 2008). 

With rapid technological advances, the cost of 
sequencing a single plastid genome has dropped from 
$4000 to $5000 in initial studies
(e.g., Moore et al., 2006) to the point at which $100 to 
$150 for a complete plastid genome sequence is near 
at hand. At such low cost, new avenues of research 
that employ complete plastid genome data are readily 
affordable, including analyses of phylogeography and 
other population-level applications. To date, however, 
most studies using plastid genomics have focused on 
deep-level phylogenetic problems. Early studies that 
employed complete plastid genome sequencing were, 
by necessity, limited in their taxonomic sampling 
(e.g., Goremykin et al., 2003), and consequently
produced erroneous results (see Soltis et al., 2004; 
Leebens-Mack et al., 2005). Subsequent studies of 
plastid genomes with increased taxon sampling appear 
more robust to the methods and assumptions of 
phylogenetic inference and have provided much 
insight into many of the vexing deep-level problems 
in angiosperms. For example, Moore et al. (2007) and 
Jansen et al. (2007) used complete plastid genome 
data sets to resolve relationships among the major 
clades of angiosperms, and Moore et al. (2010) used
complete plastid genome sequencing to resolve
relationships among Pentapetalae sensu Cantino et 
al. (2007); other formal names within angiosperms 
also follow Cantino et al. (2007). We review these 
studies below. 

Mesangiospermae (sensu Cantino et al., 2007) 
consist of five major clades (Magnoliidae, Monocoty- 
ledoneae, Chloranthaceae, Ceratophyllaceae, and 
Eudicotyledoneae) and comprise all extant flowering 
plants other than Amborellaceae, Nymphaeales, 
and Austrobaileyales. Within Mesangiospermae, the 
Monocotyledoneae (monocots) and Eudicotyledoneae 
(eudicots) contain approximately 20% and 75% of all 
flowering plant species, respectively. Relationships 
among the five clades of Mesangiospermae have been 
difficult to determine even with data sets of up to 11
genes (reviewed in Soltis et al., 2005). However,
analyses of complete plastid genome sequence data
have resolved relationships among clades of
Mesangiospermae with generally high bootstrap support
(Jansen et al., 2007; Moore et al., 2007). Significantly, 
complete plastid genome sequence data provided 
strong support for monocots as sister to
Ceratophyllaceae–eudicots (Jansen et al., 2007; Moore et al.,
2007). Furthermore, Magnoliidae and Chloranthaceae
form a clade (albeit with low support) that is sister to
the monocot–Ceratophyllaceae–eudicot clade (Fig. 1).
Moore et al. (2007) estimated that the Mesangiosper- 
mae lineages, which ultimately gave rise to 99% of all 
extant angiosperm species, appeared in as few as five 
million years. For perspective, this is comparable in 
geologic timing to the rapid radiation of species of the 
modern silversword alliance on the Hawaiian Islands
(Baldwin & Sanderson, 1998). 

Complete plastid genome sequencing has clarified 
relationships at deep levels within the Pentapetalae, or 
core eudicots excluding Gunnerales (Moore et al., 
2010). With the exception of placing Gunnerales as 
sister to all other core eudicots, earlier studies 
employing as many as five genes failed to resolve 
relationships among the major lineages of Pentapetalae 
(reviewed in Soltis et al., 2005; Burleigh et al., 2009):
Caryophyllales, Dilleniaceae, Saxifragales, [...],
Asteridae, Santalales, and Berberidopsidales. Using
[...] clades: a superrosid clade containing Saxifragales,
[...]; a superasterid clade containing
Santalales, Berberidopsidales, and Caryophyllales as
subsequent sisters to Asteridae; and Dilleniaceae
(Fig. 1). The splitting of these subclades also occurred
very rapidly, again perhaps within five million years.
The recognition of two major clades of Pentapetalae
(superrosids and superasterids) has major implications
for understanding patterns of morphological
diversification. There appear to be morphological features
that differ between these two clades that require 
examination within this new phylogenetic context. For 
example, perianth zygomorphy and inferior ovaries 
predominate in the superasterids, whereas actinomor- 
phy and superior ovaries typify superrosids. Con- 
versely, a floral hypanthium and woodiness are more 
common features in the superrosids than super- 
asterids. 

Complete sequencing of the slowly evolving plastid 
inverted repeat (IR) region has emerged as a quick 
and inexpensive alternative to full plastid genome 
sequencing for deep-level phylogenetic inference (see 
Jian et al., 2008; Brockington et al., 2009; Wang et 
al., 2009; Moore et al., in press). The entire IR region 
can be easily sequenced using the near-universal 
angiosperm primers described by Dhingra and Folta
(2005). This sequencing approach successfully re- 
solved relationships within Saxifragales (Jian et al., 
2008) and Rosidae (Wang et al., 2009). For example, 
analyses of the Rosidae using the complete IR region 
supported two large clades, each with 100% bootstrap 
support, following the divergence of Vitaceae. These 
clades correspond to (1) the Fabidae, which include 
the nitrogen-fixing clade, Celastrales, Huaceae,
Malpighiales, Oxalidales, and Zygophyllales, and (2) the
Malvidae, which include Huerteales, Brassicales, 
Malvales, and Sapindales, as well as Geraniales, 
Myrtales, Crossosomatales, and Picramniales. 

Recently, Moore et al. (in press) constructed a large 
matrix of IR sequences for over 240 angiosperm 
terminals. This tree, with far greater taxon sampling 
compared to previous complete plastid genome 
analyses (above), reveals the same pattern of relation- 
ships among major clades of eudicots. At this point, 
the one remaining unresolved deep-level phylogenetic 
question within eudicots is the placement of Dille- 
niaceae. Whereas early studies employing one to four 
genes consistently placed Dilleniaceae with
Caryophyllales, albeit with low internal support, analyses of
83 plastid genes (Moore et al., 2010) and IR sequence 
data (Moore et al., in press) place Dilleniaceae as 
sister to all or most Pentapetalae (Fig. 2). 

While the recent abundance of plastid genome data 
has advanced our understanding of deep-level angio- 
sperm relationships, genomic data from the nucleus 
and mitochondria will be necessary to corroborate the 
phylogenetic hypotheses from plastid genome analyses. 
Because the plastid genome is a single, non-recombin- 
ing locus, evolutionary processes such as introgression 
or incomplete lineage sorting could result in incongruence
between the plastid genome tree and the species
tree topology. Likewise, the strong support observed in
plastid genome analyses is not unexpected, given that 
genome-scale data sets of this size may include enough
data to reduce the effects of stochastic error. Systematic
errors may remain, potentially resulting in misleading
estimates of phylogeny (e.g., Phillips et al., 2004).
Thus, in analyses of plastid genome data sets, we are
now challenged to identify potential biases that may
produce error.

Figure 1. Phylogram of the best ML t[...] that are known in angiosperm plastid genomes for 83 angios[...]. Numbers associated with branches are ML bootstrap support (BS) values. [...] the phylogram.


IMPROVED ESTIMATES OF DIVERGENCE TIMES


There has long been an interest in using molecular 
data to date the origin of the angiosperms (reviewed in
Sanderson et al., 2004). Early attempts to estimate the
age of the angiosperms produced highly variable 
values, ranging from ca. 125 to greater than 400 
million years ago (Ma). Most of these early estimates 
also conflict with the fossil record (see Sanderson & 
Doyle, 2001; Soltis et al., 2002; Sanderson et al., 
2004; Bell et al., 2005; Magallón & Castillo, 2009).
Importantly, more recent efforts to date the origin of 
the angiosperms have converged on estimates that are 
between 180 and 140 Ma. Some of these recent 
estimates are only five to 10 million years older than
the oldest angiosperm fossils (e.g., Sanderson et al.,
2004; Bell et al., 2005; Magallón & Castillo, 2009),
although other recent studies have yielded much older
estimates, suggesting a possible Triassic or Permian
origin of crown angiosperms (Magallón, 2010; Smith et
al., 2010).

Figure 2. ML topology derived from genetic algorithm for rapid likelihood inference (GARLI) analysis of all available IR
sequences in Eudicotyledoneae. A. Overview of topology (inset) showing major clades of Eudicotyledoneae, and phylogram
depicting relationships in basal eudicots, Gunnerales, Dilleniaceae, and superasterids. B. Phylogram depicting
relationships among superrosids.

Until recently, the most taxonomically comprehensive
dating analysis for the angiosperms was performed
by Wikström et al. (2001). These authors,
using nonparametric rate smoothing (NPRS) and a
data set of 560 angiosperm [...]
the immense interest in using large
angiosperm phylogenies to investigate questions in
ecology and comparative evolution, Bell et al. (2010)
provided new estimates of the age of the angiosperms as
well as of the major clades of angiosperms.

Using 22 calibration points or age constraints and 
the 560-angiosperm data set of Soltis et al. (1999, 
2000), Bell et al. (2010) conducted multiple analyses 
using Bayesian Evolutionary Analysis Sampling Trees 
(BEAST) (Drummond & Rambaut, 2007), a relaxed 
clock methodology that does not assume any correla- 
tion between rates, thus accounting for the potential of 
lineage-specific rate heterogeneity. In one set of 
BEAST analyses based on 36 fossil constraints, Bell 
et al. (2010) obtained an estimated age of the 
angiosperms of 199-167 Ma, which is still older than 
the age of the oldest known fossils (132 Ma; Hughes, 
1994). These results, as well as other recent dating 
studies, suggest a Late Jurassic to Early Cretaceous 
origin and initial diversification of crown group 
angiosperms (e.g., Sanderson et al., 2004; Bell et 
al., 2005). However, other recent studies suggest an 
even older age of crown group angiosperms (Magallón,
2010; Smith et al., 2010). Hence, these molecular
estimates indicate that angiosperm fossils older than 
those discovered to date may exist and are awaiting 
discovery. Bell et al. (2010) also obtained the 
following age estimates for major angiosperm clades: 
Mesangiospermae (156-139 Ma); Gunneridae (core
eudicots; 139-109 Ma); Rosidae (132-118 Ma); and
Asteridae (119-101 Ma) (Fig. 3). A more complete
set of divergence times is given in Table 1. 
Significantly, recent topologies (above) as well as 
these recent studies of divergence times also provide 
insights into Darwin’s abominable mystery—the rapid 
rise and early diversification of the angiosperms. Both 
tree topologies and estimated dates of divergence 
suggest not just one or a few major radiations in the 
angiosperms, but many successive rapid radiations. 
For example, a series of recent studies, many based on 
complete plastid genome data sets, indicate rapid 
radiations throughout the diversification of major 
groups of angiosperms, including the lineages of
Mesangiospermae (Jansen et al., 2007; Moore et al.,
2007), the lineages of Pentapetalae (Moore et al., 
2010), and within subclades of core eudicots, such as 
Rosidae (Wang et al., 2009) and Saxifragales (Jian et 
al., 2008). 


THE RISE OF ANGIOSPERM-DOMINATED FORESTS AND
ASSOCIATED CODIVERSIFICATION EVENTS 


Plastid phylogenomics revealed that Rosidae are 
divided into the Malvidae and Fabidae clades and 
split rapidly into several major lineages over a period 
of less than 15 million years, perhaps as quickly as 
four to five million years (Wang et al., 2009). 
Estimates for the age of crown group Rosidae ranged 
from 115-93 Ma (Late Aptian to Early Turonian), in 
the Early to Late Cretaceous, followed by rapid 
diversification into the Fabidae and Malvidae crown 
groups around 112-91 Ma (Albian to Coniacian) and 
109-83 Ma (Cenomanian to Santonian), respectively
(Wang et al., 2009). These estimates of the timing of
the rapid diversification of these rosid lineages are 
comparable to published values based on molecular 
estimates from broad angiosperm surveys (Wikström 
et al., 2001; Davies et al., 2004; Magallón & Castillo, 
2009; Bell et al., 2010). For example, Wikström et al. 
(2001) provided an estimate of 117-108 Ma (their 
node 15), and Davies et al. (2004) estimated ca. 115- 
110 Ma. 

Wang et al. (2009) proposed that the bursts in 
diversification within the rosids correspond to the 
rapid rise of angiosperm-dominated forests (see Crane, 
1987; Upchurch & Wolfe, 1993). In fact, woodiness is 
particularly prevalent within the rosid clade. Families
in the Fabidae include most of our temperate, as well
as many tropical, trees (e.g., Betulaceae, Casuarinaceae,
Clusiaceae, Euphorbiaceae, Fabaceae, Fagaceae,
Juglandaceae, Moraceae, Ochnaceae, Rhizophoraceae,
Rosaceae, Salicaceae, and Ulmaceae). The
Malvidae include a number of subclades with
important forest trees, such as subclades representing
Malvales, Sapindales, Brassicales, and Myrtales.
Malvales and Sapindales comprise key tropical forest
elements, including Rutaceae, Meliaceae, Sapindaceae,
Simaroubaceae (Sapindales), and Malvaceae
and Dipterocarpaceae (Malvales). Myrtales also
comprise important forest elements in the families
Myrtaceae, Melastomataceae, and Combretaceae.

[Figure caption fragment: "... represents mean age estimates obtained from the 95% posterior density ..."]

Table 1. Estimated ages for major angiosperm crown clades. Clade numbers refer to numbered nodes in figure 1 from Bell
et al. (2010). For those clades that have been named, we have provided clade names from either Cantino et al. (2007) or
Angiosperm Phylogeny Group III (2009). BEAST analyses were estimated using an uncorrelated lognormal (UCLN) model and
36 fossil constraints (see Bell et al., 2010).

Clade                     Wikström et al. (2001) (Ma)    BEAST (Ma)
 1  Angiospermae          158-179                        183 (167-199)
 2                        153-171                        173 (160-187)
 3  Mesangiospermae       *                              146 (139-156)
 4                        *                              140 (128-140)
 5  Magnoliidae           122-132                        122 (108-138)
 6                        127-134                        119 (100-138)
 7                        108-113                        118 (107-133)
 8                        140-155                        156 (146-168)
 9  Eudicotyledoneae      131-147                        130 (123-139)
10                        130-144                        129 (116-143)
11                        128-140                        125 (110-138)
12                        124-137                        134 (120-145)
13                        123-135                        127 (109-139)
14  Gunneridae            116-127                        127 (109-139)
15  Pentapetalae          114-124                        121 (111-124)
16  Superasterids         104-111                        120 (112-131)
17                        106-114                        121 (113-129)
18                        *                              114 (107-122)
19  Asteridae             102-112                        110 (101-119)
20                        114-125                        108 (99-116)
21  Core asterids         107-117                        100 (92-109)
22  Superrosids           111-121                        128 (120-135)
23  Rosidae               108-117                        125 (118-132)
24                        95-101                         116 (108-121)

* Node not compatible with inferred tree.


The diversification of rosids is closely congruent in 
geologic time with a number of other major diversi- 
fication events. For example, the diversification of 
major ant lineages is attributed to the “rise in 
angiosperm-dominated forests” (Moreau et al., 2006: 
103) and corresponds to the time period estimated 
here for the rosid radiation. This time period also 
corresponds to the radiation of other major herbivores, 
such as beetles and hemipterans (Farrell, 1998; Wilf 
et al., 2000). Diversification in amphibians is 
estimated to have occurred slightly later (85-80 Ma), 
although it is similarly attributed to the rise of 
angiosperm forests (Roelants et al., 2007); in fact,
82% of amphibian species live in forests. The 
majority of living ferns similarly resulted from a 
Cretaceous diversification (initiated ca. 100 Ma) 
coupled with the rise of angiosperm forests; diver- 
gence time estimates suggest that ferns diversified “in 
the shadow of angiosperms” (Schneider et al., 2004: 
553). Similarly, the major splits underlying the 
diversification of the extant lineages of placental 
mammals occurred in a similar time frame, from 100- 
85 Ma (Bininda-Emonds et al., 2007). The rise of all 
of these lineages appears to have closely tracked the 
rise of angiosperm-dominated forests. Most of these
key forest lineages occur within the Rosidae. Hence,
the radiations detected in Rosidae largely represent 
the rapid rise of angiosperm-dominated forests and 
associated codiversification events that have pro- 
foundly shaped much of the current terrestrial 
biodiversity (Wang et al., 2009). 


ROUTINE SEQUENCING OF COMPLETE NUCLEAR GENOMES


Next-generation sequencing has made it possible to 
sequence the entire nuclear genome much more 
rapidly and inexpensively than just a few years ago. 
Still, such comprehensive sequencing of angiosperm 
genomes has been limited mostly to crops and model 
plants (e.g., Arabidopsis thaliana (L.) Heynh. [Arabi- 
dopsis Genome Initiative, 2000], Oryza sativa L. 
[International Genome Sequencing Project, 2005], 
Vitis vinifera L. [Jaillon et al., 2007; Velasco et al., 
2007], Carica papaya L. [Ming et al., 2008]).
However, as nuclear genome sequencing becomes 
increasingly routine and cost-effective, it is important 
to consider which nuclear genomes to sequence. A 
broad phylogenetic perspective is crucial in the study 
of genome evolution, and this can best be obtained via 
the acquisition and analysis of a phylogenetically
diverse sampling of genomes. Thus, we should identify
and focus on plant taxa that are phylogenetically 
placed to maximize our understanding of the overall 
patterns of genome evolution in plants (see Pryer et 
al., 2002; Soltis et al., 2008). 

One such phylogenetically pivotal angiosperm is 
Amborella trichopoda Baill., the sister to all other 
extant angiosperms (e.g., Soltis et al., 1999, 2000; 
Leebens-Mack et al., 2005; Jansen et al., 2007; Moore 
et al., 2007). Because all angiosperm nuclear genomes 
sequenced to date have been either monocots or 
eudicots, obtaining the nuclear genome sequence of 
Amborella will be crucial for providing a better 
understanding of the processes shaping genome and 
gene evolution on a broad scale across the flowering 
plants, as well as a better understanding of the many 
similarities and differences between model monocot 
and eudicot plants. A complete nuclear genome 
sequence of A. trichopoda will therefore be an 
exceptional resource for plant genomics (Soltis et 
al., 2008) in much the same way as the nuclear 
genome sequence of the platypus (as sister to other 
mammals) was a crucial resource for mammals 
(Warren et al., 2008). Ultimately, complete nuclear 
genome sequences of other “basal angiosperms” will 
also be important. Comparing the nuclear genomes of 
Amborella with those of other “basal angiosperms,” 
monocots, and eudicots would be of enormous value in 
helping to reconstruct genome and morphological 
evolution in early angiosperms. For example, many 
key angiosperm features, such as the flower and 
accompanying diverse pollination systems, double 
fertilization, vessel elements, diverse biochemical 
pathways, and many of the specific genes that regulate 
key growth and developmental processes, first ap- 
peared among the descendants of the first splits in 
angiosperm phylogeny (e.g., Soltis et al., 2005, 2008). 

For similar reasons, the nuclear genome of 
Aquilegia formosa Fisch. ex DC. (Ranunculaceae) is 
being sequenced. Aquilegia L. is a member of 
Ranunculales, a clade that is sister to all other 
eudicots; consequently, this genome sequence will be 
an important evolutionary reference for all eudicots. 
Aquilegia has also been used in studies of pollination, 
mating system evolution, floral development, and 
adaptive radiation (Kramer, 2009), so a complete 
nuclear genome sequence will provide a wealth of 
data for comparisons with these genomes. 

A strong argument can also be made for sequencing 
the genomes of taxa that are sister to all other lineages 
within each major clade of angiosperms, for example, 
in Gunneridae, superrosids, superasterids, as well as 
Rosidae, Asteridae, and Caryophyllales. Perhaps one 
of the most important nuclear genomes to sequence 
based on its phylogenetic position is that of Gunnera
L. (Gunneraceae), a member of Gunnerales, the clade
that is sister to all other Gunneridae. Ultimately, a
nuclear genome sequence for at least one representative
of each of the major angiosperm clades (perhaps
using the 59 orders sensu Angiosperm Phylogeny
Group III as a guide) would provide a broad suite of
phylogenetically informative reference genomes for 
use in plant biology. 


TOWARD A GREEN PLANT TREE OF LIFE 


The availability of DNA sequences from thousands
of taxa across a broad phylogenetic spectrum has
motivated efforts to construct phylogenetic hypotheses
that encompass much of the species diversity of green
life. During just the past three years, phylogenetic
analyses that include thousands or tens of thousands
of species have become increasingly common (e.g.,
Bininda-Emonds et al., 2007; Goloboff et al., 2009;
Smith et al., 2009). Establishing such a broad
framework of evolutionary relationships across not
only green plants, but all of life, will have profound
implications for how we study many areas of biology.
Considering just the green plants, a phylogenetic
underpinning can yield important new insights for
comparative genomics and molecular evolution,
developmental genetics, the study of adaptation,
speciation, community assembly, and even ecosystem
structure and function that are not possible with
smaller trees.

A series of recent studies has demonstrated the — 
value and utility of such enormous phylogenetic trees. 
For example, trees numbering in the thousands of 
terminals have helped clarify the tempo and mode of 
molecular evolution of species and clades of flowering | 
plants in relationship to plant life history (Smith & — 
Donoghue, 2008) and have helped elucidate patterns © 
of biodiversity in the flora of South Africa (Forest et — 
al., 2007). Large trees may also help predict responses 
to environmental issues such as global climate change 
(Edwards et al., 2007; Willis et al., 2008) and the 
success of potential biological invasions (e.g., Strauss
et al., 2006; Proches et al., 2008). Several studies also 
illustrate an important new trend in tree-building 
studies. Although systematists typically think in terms 
of building trees for a particular clade, this research 
illustrates the value of building big trees for all of the 
plant taxa in a given geographic area (e.g., Webb et 
al., 2002; Kraft et al., 2007; Wright et al., 2007; 
Cavender-Bares et al., 2009; Vamosi et al., 2009). 

Several approaches have been used to build these 
comprehensive trees, including supertree methods, 
which combine smaller phylogenetic trees into a
single, comprehensive tree (Bininda-Emonds, 2004); a
supermatrix approach, which infers trees from concatenated
alignments of genes with partial taxonomic
overlap (de Queiroz & Gatesy, 2007; Smith & 
Donoghue, 2008); and a hybrid megaphylogeny 
approach (Smith et al., 2009). An advantage of 
supermatrix approaches is that branch lengths can 
be estimated from the molecular data, and branch 
lengths are required both for maximum likelihood (ML)
and Bayesian reconstructions of character
states and for many comparative analyses.
Until recently, the largest plant phylogenies con- 
structed using a supermatrix approach included up to 
4600 species (e.g., Källersjö et al., 1999; McMahon & 
Sanderson, 2006; Smith & Donoghue, 2008). Howev- 
er, more recent studies have analyzed supermatrices 
with more than 10 times as many taxa. Illustrating the 
recent trend to build much larger trees, a parsimony 
analysis of 73,000 eukaryote taxa was recently 
completed (Goloboff et al., 2009), and ML analyses 
of ca. 13,000 (Smith et al., 2009) and ca. 19,000 
(Burleigh, in prep.) plant sequences have also been 
performed. Even larger plant trees are on the way; 
Smith et al. (in prep.) are analyzing a 50,000-taxon 
data set for green plants. Still, while the size of these 
trees is impressive, their usefulness ultimately
depends on their quality, which in large part has yet to
be assessed.
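
The concatenation step behind a supermatrix can be illustrated with a minimal sketch. It is not the pipeline used in any of the studies cited above. The function name, the gene and taxon labels, and the toy sequences are all hypothetical, and a taxon lacking a gene is simply padded with the missing-data symbol "?", which is one common way to represent partial taxonomic overlap.

```python
from typing import Dict, List, Tuple


def build_supermatrix(
    gene_alignments: Dict[str, Dict[str, str]],
    missing: str = "?",
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into a single supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Taxa that lack a gene are padded with the missing-data symbol.
    Returns the matrix and a list of (gene, start, end) partitions,
    with 1-based inclusive column coordinates.
    """
    # The supermatrix contains every taxon seen in at least one gene.
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad with missing data where this taxon was not sampled.
            rows[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


# Toy data: invented sequences, for illustration only.
toy = {
    "rbcL": {"Amborella": "ATGTCA", "Gunnera": "ATGTCG", "Vitis": "ATGTCG"},
    "matK": {"Amborella": "TTGA", "Vitis": "TTGC"},
}
matrix, partitions = build_supermatrix(toy)
for taxon, seq in matrix.items():
    print(f"{taxon:10s} {seq}")
print(partitions)
```

In this toy case Gunnera has no matK sequence, so its row begins with four "?" characters. The partition list records where each gene sits in the concatenated alignment, so that a downstream ML or Bayesian analysis could, if desired, fit a separate model to each gene.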

Recently, the funding of the iPlant Tree of Life 
(iPToL) project through the National Science Foun- 
dation (NSF)-funded iPlant Collaborative affords the 
opportunity to address the grand challenge of 
constructing, analyzing, and navigating the green 
plant tree of life. The project will provide tools for the 
systematics community and a cyberinfrastructure to 
construct, navigate, and employ big trees. For 
example, character-state reconstruction and gene- 
tree/species-tree reconciliation methods cannot now 
be implemented on large trees; early goals of iPToL
will be to scale up and build these and other tools that
can be employed on large trees. Such tools will be of 
broad benefit to the plant biology community. 

The systematics community is now in position to 
take a true “moon shot”: iPToL will attempt to build 
the infrastructure for reconstructing a comprehensive 
phylogeny of green plants, first for 100,000 species in 
the next two years, and ultimately for all 500,000 
species. Just as some of the earliest large-scale plant 
phylogenetic studies resulted directly from coopera- 
tion of many plant systematists, this new scale of plant 
phylogenetic inference is a direct result of the 
coordinated and collaborative efforts of plant system- 
atists with computer scientists and computational 
biologists. 

However, tools alone will not be enough to complete 
this grand challenge successfully. Only ca. 75,000 
plant taxa are now represented in GenBank, and many
of these sequences are fragmentary. To realize a
complete green tree of life will consequently require a 
vast amount of additional sequence data, as well as 
agreement on a set of gene regions to be sequenced for 
all plants and/or genomic approaches that make it 
possible to rapidly sequence large numbers of gene 
regions on a large phylogenetic scale. 